Interactive Data Visualization


Chester Ismay (cismay@reed.edu)

Paideia 2k16

Slides available at http://rpubs.com/cismay/paideia_2k16_idv

The Iris flower data set

  • Introduced by Ronald Fisher in 1936

  • The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor).

  • Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, Fisher developed a model to distinguish the species from each other.

Source: Wikipedia
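
A quick way to confirm the structure described above is to inspect the `iris` data frame that ships with base R. The sketch below uses only base functions; the name `species_means` is an illustrative choice and not part of the original slides.

```r
# iris is part of the datasets package that loads with base R
data(iris)

# 150 rows (50 per species) and 5 columns (4 measurements + Species)
dim(iris)
table(iris$Species)

# Mean of each of the four measurements, by species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
species_means
```

The per-species means give a first hint of why the four measurements are enough to tell the species apart.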

Scatterplots

Traditional (boring) plot

with(iris, plot(x = Petal.Width, y = Sepal.Length))

Prettier (not quite as boring) plot

library(ggplot2)  # provides qplot()
qplot(Petal.Width, Sepal.Length, data = iris)

Interactive plot using plotly

library(plotly)  # provides ggplotly(); also attaches ggplot2
ggiris <- qplot(Petal.Width, Sepal.Length, data = iris)
ggplotly(ggiris)

Prettier interactive plot using plotly

ggiris_colored <- qplot(Petal.Width, Sepal.Length, data = iris, 
  color = Species)
ggplotly(ggiris_colored)

Another interactive plot

# Current plotly versions expect formulas (~) when referring to data columns
iris %>% plot_ly(x = ~Petal.Width, y = ~Sepal.Length,
  type = "scatter", color = ~Species, mode = "markers")

Scatterplots (Part Deux)

Reed College majors vs. FTE by department

  • Based on an analysis done by Rich Majerus in 2014 using the googleVis package

  • Data does not include 94 interdisciplinary majors and 40 undecided majors.

  • Majors like Bio/Chem are split between the two departments

  • General Lit majors are included with English

  • Dance majors and faculty are included with Theatre

major_data %>% ggplot(aes(x = Majors, y = FTE)) +
  geom_point() +
  ggtitle("Reed College Majors and FTE by Department")

Left-click and drag to select an area of the chart to zoom in on. Right-click to zoom back out.

Alaska Airlines departure delays in the PNW

  • The pnwflights14 package contains information about all flights that departed from SEA in Seattle and PDX in Portland in 2014: 162,049 flights in total.

  • We can use this data and the dplyr package to look at daily maximum departure delays throughout the year for Alaska Airlines.
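
The slides do not show how the `alaskan` data frame used in the next plot was built. The following is a minimal sketch of one way to do it with dplyr, assuming the `flights` table in pnwflights14 has the columns `year`, `month`, `day`, `carrier`, and `dep_delay`, and that Alaska Airlines uses the carrier code "AS". The helper name `daily_max_delay` is hypothetical and not from the original talk.

```r
library(dplyr)

# Hypothetical helper: daily maximum departure delay for one carrier.
# Assumes columns year, month, day, carrier, dep_delay.
daily_max_delay <- function(flights_df, carrier_code = "AS") {
  flights_df %>%
    # keep one carrier and drop flights with no recorded delay
    filter(carrier == carrier_code, !is.na(dep_delay)) %>%
    # build a proper Date column from the separate year/month/day columns
    mutate(date2014 = as.Date(paste(year, month, day, sep = "-"))) %>%
    group_by(date2014) %>%
    summarize(max_dep_delay = max(dep_delay))
}

# Usage, only if the pnwflights14 package is installed
if (requireNamespace("pnwflights14", quietly = TRUE)) {
  data("flights", package = "pnwflights14")
  alaskan <- daily_max_delay(flights)
}
```

The result has one row per day with the columns `date2014` and `max_dep_delay`, which are the names the line graph code expects.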

Time series/line graphs

alaskan %>% ggplot(aes(x = date2014, y = max_dep_delay)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %y") +
  xlab("Date") +
  ylab("Maximum Departure Delay")

ggplotly()

Canadian and US population and geography

  • Canada has an extremely large land mass (it is the 2nd largest country in the world by area) but ranks only 37th in population

  • The US ranks 4th in land mass and 3rd in population

  • We can use data in the maps package to better visualize why these rankings exist
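
Before mapping, a numeric comparison of the two city data sets can set expectations. This is a rough sketch, assuming the maps package is installed and that both `canada.cities` and `us.cities` carry a `pop` column; `city_summary` is an illustrative name. Note that the totals cover only the cities listed in each data set, not the full national populations.

```r
# Load the two city data sets without attaching the whole maps package
data(canada.cities, package = "maps")
data(us.cities, package = "maps")

# Number of listed cities and their combined population, per country
city_summary <- data.frame(
  country   = c("Canada", "US"),
  n_cities  = c(nrow(canada.cities), nrow(us.cities)),
  total_pop = c(sum(canada.cities$pop), sum(us.cities$pop))
)
city_summary
```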

Maps

data(canada.cities, package = "maps")
canada_plot <- ggplot(canada.cities, aes(x = long, y = lat)) +
  coord_equal() +
  geom_point(aes(size = pop, text = paste0(name, ", ",
    "Pop: ", prettyNum(pop, big.mark = ",", scientific = FALSE))), 
    colour = "red", alpha = 1/2) +
  borders(regions="canada")
canada_plot

ggplotly(canada_plot)

data(us.cities, package = "maps")
us_plot <- ggplot(us.cities, aes(x = long, y = lat)) +
  coord_equal() +
  geom_point(aes(size = pop, text = paste0(name, ", ",
    "Pop: ", prettyNum(pop, big.mark = ",", scientific = FALSE))), 
    colour = "red", alpha = 1/2) +
  borders(regions="usa", xlim = c(-200, -60), ylim = c(20, 80))
us_plot

ggplotly(us_plot)

3D objects

plot_ly(z = volcano, type = "surface")

Interactive Data Tables

library(DT)  # provides datatable()
datatable(iris, options = list(pageLength = 5))

Another data table example

RA Duty Scheduling

Other resources

Plotting maps in R with ggplot2

HTML Widgets for R

Leaflet package for R

Gapminder (its Trendalyzer software was acquired by Google in 2007)

Hans Rosling’s TED talk - “The Best Stats You’ve Ever Seen”

What can I help you with?

  • Data analysis
  • Data wrangling/cleaning
  • Data visualization
  • Data tidying/manipulating
  • Reproducible research

When am I available?

  • Email me at cismay@reed.edu or chester.ismay@reed.edu to schedule a time to meet if office hours don’t work
  • Tentative Spring 2016 office (ETC 223) hours
    • Mondays (10 AM to 11 AM)
    • Tuesdays (2 PM to 3 PM)
    • Wednesdays (1:30 PM to 2:30 PM)
  • Sometimes available for virtual office hours via Google Hangouts (email me for details)

Thanks!


cismay@reed.edu